
    Design Optimization of Tank Track Pad Meta-Material Using the Cell Synthesis Method

    The elastomeric backer pad on the M1 Abrams tank track experiences highly cyclic and dynamic loads under normal operating conditions. Because the pad material is viscoelastic, its hysteresis generates substantial heat within the pad, which leads to early failure. Previous research at Clemson University sought to design a meta-material that mimics the deformation behavior of the elastomeric backer pad but is made from a linearly elastic constitutive material, eliminating hysteresis. A meta-material in this context is an artificial material in the form of a periodic structure whose effective properties differ from those of its constitutive material. Previous attempts to design a feasible meta-material as an effective replacement for the existing elastomeric backer pad have been unsuccessful. This research therefore focuses on developing a meta-material that satisfies all the application-specific requirements. The meta-material is designed following the steps prescribed by the Unit Cell Synthesis Method, which was developed in previous research. Using this method, a unit-cell-based periodic meta-material that exhibits nonlinear deformation behavior can be designed by combining different elemental geometries that show geometric nonlinearity under deformation. The idea is to attain a targeted nonlinear deformation response of the meta-material structure by tuning the geometric nonlinearities of one or more entities, replacing the material nonlinearity of the target material. A modification to the original method is proposed to make it more efficient: a multi-objective optimization step that considers all the relevant feasibility criteria for the meta-material design. Two unit-cell-based meta-material concepts are evaluated, and the best meta-material design is chosen based on the results of the multi-objective optimization problem.
The optimized meta-material is then subjected to dynamic tank wheel roll-over conditions to compare its deformation response with that of the original pad. Finally, conclusions are drawn and the scope for future work is discussed.
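The multi-objective selection step can be pictured as a feasibility filter followed by a Pareto-dominance filter. The sketch below is a minimal illustration under stated assumptions, not the thesis's optimizer: the two objectives (error against the target force-displacement curve, and peak stress relative to an allowable stress), the limit `max_stress_ratio`, the tie-break rule, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical objective vector per candidate design, both minimized:
# (error vs. target force-displacement curve, peak stress / allowable stress)
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs: Dict[str, Objectives]) -> List[str]:
    """Names of the designs that no other design dominates."""
    return [
        name
        for name, objs in designs.items()
        if not any(dominates(other, objs) for o, other in designs.items() if o != name)
    ]


def select_design(designs: Dict[str, Objectives], max_stress_ratio: float = 1.0) -> str:
    """Drop infeasible designs (hypothetical stress limit), then pick the
    non-dominated design closest to the target curve."""
    feasible = {n: o for n, o in designs.items() if o[1] <= max_stress_ratio}
    if not feasible:
        raise ValueError("no feasible design")
    return min(pareto_front(feasible), key=lambda n: feasible[n][0])


# Made-up numbers for illustration only.
candidates = {"A": (0.10, 0.90), "B": (0.05, 1.20), "C": (0.12, 0.95), "D": (0.08, 0.80)}
print(pareto_front(candidates))   # ['B', 'D']
print(select_design(candidates))  # D
```

Design B matches the target curve best but violates the assumed stress limit, so the feasible Pareto front reduces to D alone.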

    Scheduling Transformation and Dependence Tests for Recursive Programs

    Scheduling transformations reorder the execution of operations in a program to improve locality and/or parallelism. The polyhedral model provides a general framework for performing instance-wise scheduling transformations on regular programs, reordering the iterations of loops that operate over dense arrays through transformations such as tiling. There is no analogous framework for recursive programs, despite recent interest in optimizations such as tiling and fusion for recursive applications. This paper presents PolyRec, the first general framework for applying scheduling transformations (such as inlining, interchange, and code motion) to nested recursive programs and reasoning about their correctness. We describe the phases of PolyRec (representing dynamic instances, applying transformations, and reasoning about correctness) and show that PolyRec can apply sophisticated, composed transformations to complex, nested recursive programs and improve performance through enhanced locality.
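To make "interchange of nested recursion" and "reasoning about correctness" concrete, here is a hand-written illustration. It is not PolyRec's representation or algorithm, and all function names are hypothetical. An outer divide-and-conquer recursion over `i` calls an inner recursion over `j`; interchanging them executes the same set of dynamic instances `(i, j)` in a different order. The swap is legal when no instance depends on another (each writes its own cell) and illegal when each instance reads the previous instance's result.

```python
from typing import Callable, List, Tuple

Instance = Tuple[int, int]  # one dynamic instance of the innermost statement: (i, j)


def recurse(lo: int, hi: int, visit: Callable[[int], None]) -> None:
    """Divide-and-conquer recursion over [lo, hi) that calls `visit` at each leaf index."""
    if hi - lo <= 0:
        return
    if hi - lo == 1:
        visit(lo)
        return
    mid = (lo + hi) // 2
    recurse(lo, mid, visit)
    recurse(mid, hi, visit)


def original(n: int, m: int) -> List[Instance]:
    """Nested recursion: outer recursion over i, inner recursion over j."""
    order: List[Instance] = []
    recurse(0, n, lambda i: recurse(0, m, lambda j: order.append((i, j))))
    return order


def interchanged(n: int, m: int) -> List[Instance]:
    """Same instances with the two recursions swapped (outer over j, inner over i)."""
    order: List[Instance] = []
    recurse(0, m, lambda j: recurse(0, n, lambda i: order.append((i, j))))
    return order


def independent_body(schedule: List[Instance], a: List[int], b: List[int]) -> List[List[int]]:
    """Each instance writes only its own cell, so no instance depends on another."""
    out = [[0] * len(b) for _ in a]
    for i, j in schedule:
        out[i][j] = a[i] * b[j]
    return out


def dependent_body(schedule: List[Instance]) -> int:
    """Each instance reads the value written by the previous one, so order matters."""
    acc = 0
    for i, j in schedule:
        acc = acc * 31 + (7 * i + j)
    return acc


a, b = [1, 2, 3], [4, 5, 6, 7]
assert sorted(original(3, 4)) == sorted(interchanged(3, 4))  # same instances...
assert original(3, 4) != interchanged(3, 4)                  # ...different schedule
assert independent_body(original(3, 4), a, b) == independent_body(interchanged(3, 4), a, b)
print(dependent_body(original(2, 2)), dependent_body(interchanged(2, 2)))  # 1186 6766
```

A dependence test has to accept the interchange for `independent_body` and reject it for `dependent_body`. Making that decision over dynamic instances of recursive calls, rather than over loop iterations, is the problem the abstract describes.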

    RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search

    The problem of k-nearest neighbor search (kNNS), identifying the k nearest neighbors of a point, has proven very useful both as a standalone application and as a subroutine in larger applications. Given its far-reaching applicability in areas such as machine learning and point clouds, extensive research has gone into leveraging GPU acceleration to solve this problem. Recent work has shown that using the Ray Tracing (RT) cores in modern GPUs to accelerate kNNS is much more efficient than traditional acceleration using shader cores. However, the existing translation of kNNS into a ray tracing problem constrains the search space for neighbors. As a result, RT cores can only accelerate fixed-radius kNNS, which requires the user to set a search radius a priori and hence can miss neighbors. In this work, we propose TrueKNN, the first unbounded RT-accelerated neighbor search. TrueKNN adopts an iterative approach in which the search space grows incrementally until every point has found its k neighbors. We show that our approach is orders of magnitude faster than existing approaches and can even be used to accelerate fixed-radius neighbor searches. Comment: This paper has been accepted at the International Conference on Supercomputing 2023 (ICS'23).
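The iterative idea can be sketched on the CPU as follows. Run a fixed-radius query for every unfinished point, retire points that found at least k neighbors, and enlarge the radius for the rest. Retiring is safe because if at least k points lie within radius r, the true k nearest neighbors are all among them. Assumptions: `fixed_radius_query` is a brute-force stand-in for the RT-core query, and the starting radius `r0` and doubling factor `growth` are illustrative. TrueKNN's actual radius schedule and hardware mapping are not described in the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def fixed_radius_query(points: Sequence[Point], q: int, r: float) -> List[Tuple[float, int]]:
    """Brute-force stand-in for an RT-core fixed-radius query.

    Returns (distance, index) for every other point within distance r of points[q].
    """
    hits = []
    for i, p in enumerate(points):
        if i != q:
            d = math.dist(points[q], p)
            if d <= r:
                hits.append((d, i))
    return hits


def unbounded_knn(points: Sequence[Point], k: int, r0: float = 0.05, growth: float = 2.0) -> List[List[int]]:
    """k nearest neighbors of every point, growing the radius until all queries finish."""
    if not 0 < k < len(points):
        raise ValueError("need 0 < k < len(points)")
    if r0 <= 0 or growth <= 1:
        raise ValueError("need r0 > 0 and growth > 1")
    result: List[List[int]] = [[] for _ in points]
    pending = list(range(len(points)))
    r = r0
    while pending:
        unfinished = []
        for q in pending:
            hits = fixed_radius_query(points, q, r)
            if len(hits) >= k:
                # At least k points lie within r, so the true k nearest are all among the hits.
                result[q] = [i for _, i in sorted(hits)[:k]]
            else:
                unfinished.append(q)  # too few neighbors so far: retry with a larger radius
        pending = unfinished
        r *= growth
    return result


rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(200)]
knn = unbounded_knn(pts, k=5)
print(knn[0])
```

Unlike a single fixed-radius query, this loop cannot miss neighbors. The cost is extra query rounds for points in sparse regions.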

    D2P: Automatically Creating Distributed Dynamic Programming Codes

    Dynamic Programming (DP) algorithms are common targets for parallelization, and, as these algorithms are applied to larger inputs, distributed implementations become necessary. However, creating distributed-memory solutions involves the challenges of task creation, program and data partitioning, communication optimization, and task scheduling. In this paper we present D2P, an end-to-end system for automatically transforming a specification of any recursive DP algorithm into a distributed-memory implementation of the algorithm. Given pseudocode for a recursive DP algorithm, D2P automatically generates the corresponding MPI-based implementation. Our evaluation shows that the generated distributed implementations are efficient and scalable. Moreover, D2P-generated implementations are faster than implementations generated by recent general distributed DP frameworks, and are competitive with (and often faster than) hand-written implementations.
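The abstract does not show D2P's input language or generated MPI code. The single-process sketch below only illustrates the kind of partitioning a distributed version needs, using longest common subsequence (LCS) as an assumed example. The same recursive recurrence is evaluated over a table split into tiles, and tiles are processed in anti-diagonal wavefronts. Tiles on one wavefront do not depend on each other, so they could be owned by different ranks, with tile borders being what must be communicated. The tile size, the rank count, and the round-robin owner assignment are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple


def lcs_recursive(a: str, b: str) -> int:
    """The kind of recursive DP specification a user might write (LCS length)."""

    @lru_cache(maxsize=None)
    def L(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return 0
        if a[i - 1] == b[j - 1]:
            return L(i - 1, j - 1) + 1
        return max(L(i - 1, j), L(i, j - 1))

    return L(len(a), len(b))


Wavefront = List[Tuple[int, int, int]]  # (owner rank, tile row, tile column)


def lcs_tiled(a: str, b: str, tile: int = 3, ranks: int = 2) -> Tuple[int, List[Wavefront]]:
    """Evaluate the same recurrence tile by tile in anti-diagonal wavefronts.

    Tile (bi, bj) needs tiles (bi-1, bj), (bi, bj-1) and (bi-1, bj-1), which all lie on
    earlier anti-diagonals, so the tiles of one wavefront are independent of each other.
    """
    n, m = len(a), len(b)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    tiles_i, tiles_j = -(-n // tile), -(-m // tile)  # ceiling division
    schedule: List[Wavefront] = []
    for d in range(tiles_i + tiles_j - 1):
        front = [(bi, d - bi) for bi in range(tiles_i) if 0 <= d - bi < tiles_j]
        # Hypothetical round-robin owner assignment; a distributed version would run
        # each rank's tiles concurrently and exchange tile borders between wavefronts.
        schedule.append([(k % ranks, bi, bj) for k, (bi, bj) in enumerate(front)])
        for bi, bj in front:
            for i in range(bi * tile + 1, min((bi + 1) * tile, n) + 1):
                for j in range(bj * tile + 1, min((bj + 1) * tile, m) + 1):
                    if a[i - 1] == b[j - 1]:
                        T[i][j] = T[i - 1][j - 1] + 1
                    else:
                        T[i][j] = max(T[i - 1][j], T[i][j - 1])
    return T[n][m], schedule


length, waves = lcs_tiled("ABCBDAB", "BDCABA")
print(length, lcs_recursive("ABCBDAB", "BDCABA"))  # 4 4
for w in waves:
    print(w)
```

The recursive and tiled versions compute the same value. Only the evaluation order, and therefore the available parallelism and data placement, changes.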

    Generalized Neighbor Search using Commodity Hardware Acceleration

    Tree-based Nearest Neighbor Search (NNS) is hard to parallelize on GPUs. However, newer NVIDIA GPUs are equipped with Ray Tracing (RT) cores that can build a spatial tree called a Bounding Volume Hierarchy (BVH) to accelerate graphics rendering. Recent work proposed using RT cores to implement NNS, but all of these approaches are limited by the hardware to a single distance metric: the Euclidean distance. We propose and implement two approaches for generalized distance computations, filter-refine and monotone transformation, each of which allows non-Euclidean nearest neighbor queries to be performed in terms of Euclidean distances. We find that our reductions reduce the time taken to perform distance computations during the search, thereby improving the overall performance of the NNS.
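The two reductions can be illustrated with two standard examples. The abstract does not name the metrics, so L1 (Manhattan) distance and cosine similarity are assumptions here, and `euclidean_radius_search` is a brute-force stand-in for the RT-core query.

- **Filter-refine:** Euclidean distance never exceeds L1 distance. A Euclidean query of radius r therefore returns a superset of the L1 ball, and exact L1 distances only need to be computed on that candidate set.
- **Monotone transformation:** for unit vectors, |u - v|² = 2 - 2 cos(u, v). Ranking by decreasing cosine similarity is therefore the same as ranking by increasing Euclidean distance after normalization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean_radius_search(points: Sequence[Point], q: Point, r: float) -> List[int]:
    """Stand-in for the hardware-supported query: indices within Euclidean distance r of q."""
    return [i for i, p in enumerate(points) if math.dist(p, q) <= r]


def l1(p: Point, q: Point) -> float:
    """Manhattan (L1) distance."""
    return sum(abs(x - y) for x, y in zip(p, q))


def l1_radius_search(points: Sequence[Point], q: Point, r: float) -> List[int]:
    """Filter-refine: Euclidean distance <= L1 distance, so the Euclidean ball of radius r
    contains every point within L1 distance r (filter); exact L1 then trims it (refine)."""
    candidates = euclidean_radius_search(points, q, r)
    return [i for i in candidates if l1(points[i], q) <= r]


def normalize(v: Point) -> Point:
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.hypot(*v)
    return tuple(x / norm for x in v)


def cosine_knn(points: Sequence[Point], q: Point, k: int) -> List[int]:
    """Monotone transformation: for unit vectors u, v, |u - v|^2 = 2 - 2 cos(u, v), so the
    k most cosine-similar points are the k Euclidean-nearest after normalization."""
    qn = normalize(q)
    normed = [normalize(p) for p in points]
    return sorted(range(len(points)), key=lambda i: math.dist(normed[i], qn))[:k]


pts = [(1.0, 0.0), (0.6, 0.6), (0.3, 0.2), (2.0, 0.0)]
print(euclidean_radius_search(pts, (0.0, 0.0), 1.0))  # [0, 1, 2]  (filter)
print(l1_radius_search(pts, (0.0, 0.0), 1.0))         # [0, 2]     (refine drops (0.6, 0.6))
vecs = [(2.0, 0.1), (0.0, 3.0), (1.0, 1.0), (-1.0, 0.0)]
print(cosine_knn(vecs, (1.0, 0.0), 2))                # [0, 2]
```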

    Tribochemical investigation of microelectronic materials

    To achieve efficient planarization with reduced device dimensions in integrated circuits, a better understanding of the physics and chemistry of chemical mechanical planarization (CMP), and of the complex interplay between them, is needed. The CMP process takes place at the interface between the pad and the wafer in the presence of a fluid slurry. Cu is significantly softer than the slurry abrasive particles, which are usually alumina or silica. It is generally accepted that a surface layer can protect the Cu surface from scratching during CMP. Four competing material removal mechanisms have been reported: chemical dissolution of Cu, mechanical removal by slurry abrasives, formation of a thin Cu oxide layer, and the sweeping away of surface material by slurry flow. Despite previous investigations of Cu removal, the electrochemical properties of the Cu surface layer are not yet understood. The motivation of this research was to understand the fundamental aspects of removal mechanisms in terms of electrochemical interactions, chemical dissolution, mechanical wear, and the factors affecting planarization. Since one of the major requirements in CMP is a high surface finish, i.e., low surface roughness, the optimization of surface finish with respect to various parameters was emphasized. Three approaches were used in this research: in situ measurement of material removal, exploration of electropotential activation and passivation at the copper surface, and modeling of the synergistic electrochemical-mechanical interactions on the copper surface. Copper polishing experiments were conducted using a tabletop tribometer coupled with a potentiostat. This combination enabled the evaluation of important variables such as applied pressure, polishing speed, slurry chemistry, pH, materials, and applied DC potential.
Experiments were designed to understand the combined and individual effects of electrochemical interactions and mechanical impact during polishing. Extensive surface characterization was performed with AFM, SEM, TEM, and XPS. An innovative method for directly measuring material removal on the nanometer scale was developed and used. Experimental observations were compared with theoretically calculated material removal rates. The synergistic effect of all the components of the process, which results in a better-quality surface finish, was quantitatively evaluated for the first time. The impressed potential during CMP proved to be a controlling parameter in the material removal mechanism. Using the experimental results, a model was developed that provides practical insight into the CMP process. The research is expected to help with electrochemical material removal in copper planarization with low-k dielectrics.
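The abstract compares measurements with theoretically calculated removal rates but does not state the model used. Two textbook relations are common baselines for the mechanical and electrochemical parts of Cu removal:

- Preston's equation: rate = k_p · P · V.
- Faraday's law for anodic dissolution: thickness rate = i · M / (z · F · ρ).

The sketch below evaluates both. The Preston coefficient, pressure, speed, and current density are assumed illustrative values, not results from this research; the Cu molar mass and density are standard values. Summing the two rates is only a non-synergistic baseline. One common way to quantify synergy is as the part of the measured total that exceeds the separately measured mechanical-only and electrochemical-only rates.

```python
# Faraday constant, C/mol
FARADAY = 96485.33212
NM_PER_MIN = 1e9 * 60  # converts m/s to nm/min


def preston_rate(k_p: float, pressure_pa: float, velocity_m_s: float) -> float:
    """Preston's equation: removal rate (m/s) = k_p * P * V, with k_p in 1/Pa."""
    return k_p * pressure_pa * velocity_m_s


def faraday_rate(current_density_a_m2: float, molar_mass_kg_mol: float,
                 electrons: int, density_kg_m3: float) -> float:
    """Thickness removed per second (m/s) by anodic dissolution at a given current density."""
    return current_density_a_m2 * molar_mass_kg_mol / (electrons * FARADAY * density_kg_m3)


# Illustrative inputs: k_p, pressure, speed and current density are assumptions.
mech = preston_rate(k_p=1e-13, pressure_pa=2e4, velocity_m_s=1.0) * NM_PER_MIN
# Cu -> Cu2+ (z = 2) at 1 mA/cm^2 = 10 A/m^2; M = 0.063546 kg/mol, rho = 8960 kg/m^3
chem = faraday_rate(10.0, 0.063546, 2, 8960.0) * NM_PER_MIN
print(f"mechanical ~{mech:.0f} nm/min, electrochemical ~{chem:.1f} nm/min, "
      f"additive baseline ~{mech + chem:.0f} nm/min")
```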

    Efficient GPU Tree Walks for Effective Distributed N-Body Simulations

    N-body problems, such as simulating the motion of stars in a galaxy, are commonly solved using tree codes such as Barnes-Hut. ChaNGa is a best-of-breed n-body platform that uses an asymptotically efficient tree traversal strategy known as a dual-tree walk to quickly determine which bodies need to interact with each other to produce an accurate simulation result. However, this strategy does not work well on GPUs because of the highly irregular nature of the dual-tree algorithm. On GPUs, ChaNGa uses a hybrid strategy in which the CPU performs the tree walk to determine which bodies interact while the GPU performs the force computation. In this paper, we show that a highly optimized single-tree walk approach achieves better GPU performance by significantly accelerating the tree walk and reducing CPU/GPU communication. Our experiments show that this new design can achieve an 8.25× speedup over baseline ChaNGa in a one-node, one-process-per-node configuration.
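A single-tree walk means each body traverses the tree on its own. At each cell, the body either accepts the cell's center of mass as one point mass, if the cell is small relative to its distance, or opens the cell and visits its children. Because walks for different bodies are independent, the pattern can map to one GPU thread (or group of threads) per body.

The sketch below is a CPU, 2D, monopole-only Barnes-Hut illustration, not ChaNGa's implementation. The following choices are assumptions:

- the opening criterion size/distance < θ;
- the softening length `eps`;
- G = 1;
- the restriction θ < 1/√2, which guarantees a cell containing the body itself is always opened.

With θ = 0, every cell is opened, and the walk reproduces the direct O(n²) sum.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


@dataclass
class Cell:
    half: float                                           # half-width of the square cell
    mass: float                                           # total mass of bodies in the cell
    com: Vec                                              # center of mass
    bodies: List[int] = field(default_factory=list)       # filled only in leaves
    children: List["Cell"] = field(default_factory=list)


def build(idx: List[int], pos: Sequence[Vec], mass: Sequence[float],
          cx: float, cy: float, half: float) -> Cell:
    """Recursively build a quadtree over the non-empty body list `idx`."""
    m = sum(mass[i] for i in idx)
    com = (sum(mass[i] * pos[i][0] for i in idx) / m, sum(mass[i] * pos[i][1] for i in idx) / m)
    cell = Cell(half, m, com)
    if len(idx) == 1 or half < 1e-12:  # leaf (the size guard handles coincident bodies)
        cell.bodies = idx
        return cell
    quads: List[List[int]] = [[], [], [], []]
    for i in idx:
        quads[int(pos[i][0] >= cx) + 2 * int(pos[i][1] >= cy)].append(i)
    h = half / 2
    for q, sub in enumerate(quads):
        if sub:
            cell.children.append(build(sub, pos, mass, cx + (h if q & 1 else -h), cy + (h if q & 2 else -h), h))
    return cell


def pull(src: Vec, m: float, at: Vec, eps: float) -> Vec:
    """Softened acceleration at `at` due to mass m located at `src` (G = 1)."""
    dx, dy = src[0] - at[0], src[1] - at[1]
    r2 = dx * dx + dy * dy + eps * eps
    s = m / (r2 * math.sqrt(r2))
    return dx * s, dy * s


def walk(cell: Cell, i: int, pos: Sequence[Vec], mass: Sequence[float], theta: float, eps: float) -> Vec:
    """Single-tree walk for body i."""
    if not cell.children:  # leaf: sum its bodies directly
        ax = ay = 0.0
        for j in cell.bodies:
            if j != i:
                gx, gy = pull(pos[j], mass[j], pos[i], eps)
                ax, ay = ax + gx, ay + gy
        return ax, ay
    d = math.dist(cell.com, pos[i])
    if d > 0 and 2 * cell.half < theta * d:  # far enough: use the cell's monopole
        return pull(cell.com, cell.mass, pos[i], eps)
    ax = ay = 0.0
    for child in cell.children:  # too close: open the cell
        gx, gy = walk(child, i, pos, mass, theta, eps)
        ax, ay = ax + gx, ay + gy
    return ax, ay


def accelerations(pos: Sequence[Vec], mass: Sequence[float], theta: float = 0.5, eps: float = 1e-3) -> List[Vec]:
    if not 0 <= theta < 1 / math.sqrt(2):
        raise ValueError("this sketch needs 0 <= theta < 1/sqrt(2) to avoid self-interaction")
    xs, ys = [p[0] for p in pos], [p[1] for p in pos]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = build(list(range(len(pos))), pos, mass, (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    return [walk(root, i, pos, mass, theta, eps) for i in range(len(pos))]  # independent walks


def direct(pos: Sequence[Vec], mass: Sequence[float], eps: float = 1e-3) -> List[Vec]:
    """O(n^2) reference sum."""
    return [
        (sum(pull(pos[j], mass[j], pos[i], eps)[0] for j in range(len(pos)) if j != i),
         sum(pull(pos[j], mass[j], pos[i], eps)[1] for j in range(len(pos)) if j != i))
        for i in range(len(pos))
    ]


rng = random.Random(42)
pos = [(rng.random(), rng.random()) for _ in range(200)]
mass = [1.0] * len(pos)
approx = accelerations(pos, mass, theta=0.5)
exact = direct(pos, mass)
print(approx[0], exact[0])
```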